(5) Introduction

In the United States, breast cancer is the leading cause of death in women between 40 and 55 years
of age [1]. It is estimated that one out of eight women will develop breast cancer in their lifetime [2, 3].
There is considerable evidence that early diagnosis and treatment significantly improve the chance of
survival  for  patients  with  breast  cancer  [4-9].  The  American  Cancer  Society  —  National  Cancer 
Institute  Breast  Cancer  Detection  Demonstration  Project  (BCDDP)  has  shown  that  mammography 
contributes  significantly  in  the  detection  of  localized  breast  cancer  in  asymptomatic  women  [9]. 

Although  mammography  has  a  high  sensitivity  for  detection  of  breast  cancers  when  compared  to 
other  diagnostic  modalities,  studies  indicate  that  radiologists  do  not  detect  all  carcinomas  that  are  visible 
on  retrospective  analyses  of  the  images  [8,  10-18].  While  double  reading  can  reduce  the  miss  rate  in 
radiographic  reading  [19,  20],  it  also  increases  the  cost  of  screening.  In  our  ROC  study  [21],  we  found 
that  a  CAD  scheme,  which  alerts  the  radiologist  to  suspicious  clusters  of  microcalcifications,  can 
significantly  improve  radiologists'  accuracy  in  detecting  the  microcalcifications  under  experimental 
conditions  that  simulate  the  rapid  interpretation  of  screening  mammograms.  More  recently,  Kegelmeyer 
et  al.  [22]  also  showed  that  CAD  can  improve  radiologists'  detection  of  spiculated  masses.  These  studies 
indicate  that  CAD  is  a  viable  alternative  to  double  reading  by  radiologists. 

Early  breast  cancers  are  often  characterized  by  subtle  clustered  microcalcifications  and  masses 
[23]. It has been reported that between 30 and 50% of breast carcinomas detected radiographically
demonstrate  microcalcifications  on  mammograms,  and  40  to  50%  of  breast  carcinomas  present  as 
masses.  The  high  correlation  between  the  presence  of  microcalcifications  and  masses  and  the  presence 
of  breast  cancers  indicates  that  an  increase  in  the  accuracy  of  detection  and  analysis  of  the  characteristic 
features  of  these  lesions  may  lead  to  further  improvement  in  the  efficacy  of  mammography  as  a 
screening  procedure  for  the  detection  of  early  breast  cancer. 

In the past few years, we have been developing CAD algorithms for the detection and classification of
microcalcifications  and  masses  using  advanced  image  processing  and  computer  vision  techniques.  Our 
CAD  algorithms  have  provided  very  promising  results  in  laboratory  tests.  At  this  stage,  it  is  necessary 
to test the algorithms in a clinical trial with a large number of mammograms obtained from the general
patient  population  before  specific  methods  can  be  developed  to  further  improve  their  performance. 
Therefore,  our  goals  in  this  proposal  are  to  implement  our  CAD  algorithms  in  a  fast  workstation, 
develop  user  interfaces  for  efficient  operation  of  the  CAD  programs,  and  conduct  a  pilot  clinical  trial  of 
the  CAD  schemes  at  three  mammographic  screening  sites.  Based  on  the  results  of  the  pilot  clinical  trial, 
we  can  evaluate  the  sensitivity  and  specificity  of  the  CAD  algorithms,  analyze  the  effects  of  the  CAD 
schemes  on  mammographic  screening,  identify  any  potential  problems  in  a  clinical  environment,  and 
develop methods to further improve the CAD schemes in the future. We believe that this is a crucial
step toward developing a clinically practical CAD workstation.

It  has  been  recognized  that  digital  mammography  is  one  of  the  key  research  areas  for 
improvement  in  the  diagnosis  of  breast  cancer  [24].  Two  of  the  major  issues  in  digital  mammography 
are the technological requirements in developing high resolution digital detectors and the transmission
and archiving of the large amount of data. A number of solid-state large-area digital detectors are being
developed for mammographic application. It has been generally recognized that a pixel size of no
greater than 0.05 mm × 0.05 mm will be required for imaging the subtle features of microcalcifications.
At this resolution, a single 8" × 10" mammogram will result in 40 MB of digital data.
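
The 40 MB figure can be checked with a short calculation. The sketch below is illustrative only; it assumes 2 bytes of storage per pixel, which the text does not state, and the function name is hypothetical.

```python
MM_PER_INCH = 25.4

def mammogram_bytes(width_in, height_in, pixel_mm, bytes_per_pixel):
    """Return (columns, rows, total bytes) for a film digitized at a given pixel size."""
    cols = round(width_in * MM_PER_INCH / pixel_mm)
    rows = round(height_in * MM_PER_INCH / pixel_mm)
    return cols, rows, cols * rows * bytes_per_pixel

# Assumption: 2 bytes per pixel (e.g. 12-bit data stored in 16-bit words).
cols, rows, nbytes = mammogram_bytes(8, 10, 0.05, 2)
print(cols, rows, nbytes / 1e6)  # matrix size and size in millions of bytes
```

Under this assumption the image matrix is 4064 × 5080 pixels, or about 41 million bytes, which agrees with the quoted 40 MB.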

Data  compression  can  reduce  the  amount  of  data  for  transmission  and  storage.  However,  there  is 
often  a  tradeoff  between  compression  ratio  and  image  fidelity.  Data  compression  in  mammography  is 
especially difficult because of the very subtle image details such as microcalcifications and mass
margins  that  need  to  be  preserved.  We  have  investigated  the  effects  of  data  compression  on 
computerized  detection  of  microcalcifications  previously.  In  the  current  proposal,  we  plan  to  develop  a 
CAD  guided  data  compression  technique  to  maximize  the  compression  efficiency  with  a  minimum  loss 
of  information.  Our  approach  is  to  preserve  the  original  image  information  by  lossless  compression  in 
potentially  important  regions  on  the  mammograms  indicated  by  the  CAD  programs.  For  breast  areas 
outside  these  regions,  we  will  apply  the  most  efficient  lossy  compression  technique  that  does  not  cause 
noticeable  degradation  of  image  details.  We  will  conduct  both  receiver  operating  characteristic  studies 
and  subjective  image  quality  ranking  studies  to  compare  observer  performance  on  the  uncompressed 
images,  on  images  compressed  with  the  selected  lossy  technique,  and  on  images  compressed  with  the 
standard  JPEG  technique. 
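
The idea of region-dependent fidelity can be illustrated with a toy example. The sketch below is not the proposed compression method: it uses plain uniform quantization as a stand-in for the lossy technique, and the function name, the mask representation, and the step size are all assumptions made for illustration.

```python
def cad_guided_quantize(image, roi_mask, step):
    """Keep pixels inside CAD-flagged regions exact; quantize all others.

    image:    list of rows of non-negative integer pixel values
    roi_mask: same shape, True where the CAD programs flagged a region
    step:     quantization step used outside the flagged regions
    """
    out = []
    for row, mask_row in zip(image, roi_mask):
        out_row = []
        for pixel, in_roi in zip(row, mask_row):
            if in_roi:
                out_row.append(pixel)  # lossless inside the region
            else:
                # round to the nearest multiple of step (lossy)
                out_row.append((pixel + step // 2) // step * step)
        out.append(out_row)
    return out

image = [[100, 101, 102, 103],
         [104, 105, 106, 107]]
mask = [[False, True, True, False],
        [False, False, False, False]]
result = cad_guided_quantize(image, mask, step=8)
```

Pixels outside the mask collapse onto a coarse grid, which lowers their entropy and makes them cheaper to encode, while the flagged pixels are returned unchanged.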

The  importance  of  this  research  is  based  on  the  fact  that  x-ray  mammography  is,  at  present,  the 
most  reliable  diagnostic  procedure  for  detection  of  early  breast  cancer.  Our  proposed  research  aims  at 
the  development  of  a  CAD  workstation  which  may  assist  radiologists  in  screening  and  characterizing 
abnormalities  on  mammograms  and  the  development  of  an  efficient  CAD-guided  data  compression 
technique  for  digital  mammography.  The  CAD  workstation,  once  developed,  can  be  implemented  and 
operated cost-effectively in various breast imaging facilities as a 'second opinion,' and thus will
potentially  increase  the  diagnostic  accuracy  of  mammography  for  breast  cancer  detection.  The  data 
compression  technique  will  facilitate  the  implementation  of  telemammography  and  digital 
mammography  for  breast  cancer  screening.  These  new  technologies  therefore  are  expected  to  have  a 
significant  impact  on  patient  care,  especially  in  rural  and  remote  areas. 

(6) Body

During the funding period of 9/22/96 to 9/21/97, the three collaborating institutions, the University
of Michigan, Georgetown University, and the University of Iowa, have conducted the following tasks:

(a)  Development  of  a  graphical  user  interface  (GUI)  for  the  CAD  workstation 

One of the aims of this project is to develop a GUI for implementation of the CAD software on a
workstation. We have designed a GUI based on the needs and requirements of our project. The
application should be as independent of operating systems as possible, present a comfortable and
customizable user interface, and provide the developer with a reasonably simple means of incorporating
new analysis tools. We expect to meet these goals by employing a modified architecture based upon the
Model-View-Controller scheme developed in the Smalltalk environment. This architecture provides a
means to separate the window-system dependent operations of displaying data and receiving user
commands from the data manipulation code. To this end, an object-oriented approach using C++ to
structure programs that use the Xt/Motif graphical interface appears reasonable. There are hundreds of
user interface toolkits available that support a variety of languages and programming styles. However,
issues such as future availability, acceptance by users and programmers, support, and reliability have
affected this choice. We have found, via the World Wide Web, a software package (developed by
Douglas Young) which provides a framework that facilitates the GUI development. Creation of X
widgets, button interfaces and their callback registration, incorporation of the users' application
defaults specifications (colors, fonts, etc.), and so forth have been encapsulated in a number of reliable
and reusable C++ classes so that the programmer can focus more on the application behavior and less
on the user interface implementation details.

The  current  status  of  the  program  is  described  here.  A  main  window  is  presented  which  includes 
a  menubar,  a  scrolled  display  area,  and  a  system  command  interface.  The  menubar  supports  hierarchical 
menus.  Menu  choices  are  added  simply  by  providing  command  procedures  and  adding  them  to  a  list. 
The  framework  creates  the  necessary  buttons,  registers  the  appropriate  callback  mechanism,  and 
manages  or  unmanages  the  widgets.  There  is  a  facility  to  selectively  enable  or  disable  menu  choices  as 
appropriate  to  the  current  state  of  the  application.  The  display  area  uses  a  grayscale  colormap  to  present 
images.  It  will  change  size  in  response  to  the  user  resizing  the  main  window,  present  appropriate 
portions  of  the  image  in  response  to  changes  in  scrollbar  positions,  and  refresh  the  image  when  an 
obscured  portion  is  uncovered.  The  command  interface  allows  the  user  to  issue  commands  to  the 
operating  system  from  within  the  application,  and  presents  a  history  list  of  previous  commands  that  can 
be  selected  for  reissuance. 
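
The command-list mechanism described above can be sketched independently of any window system. The following is a minimal illustration in Python rather than the C++/Motif framework actually used; the class and method names (`Command`, `Menu`, `invoke`) are hypothetical and do not correspond to the framework's API.

```python
class Command:
    """A named action that a menu entry can trigger."""
    def __init__(self, name, action, enabled=True):
        self.name = name
        self.action = action
        self.enabled = enabled

class Menu:
    """A menu holding commands and nested submenus (hierarchical menus)."""
    def __init__(self, title):
        self.title = title
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)
        return entry

    def find(self, name):
        # Depth-first search through this menu and its submenus.
        for entry in self.entries:
            if isinstance(entry, Menu):
                found = entry.find(name)
                if found is not None:
                    return found
            elif entry.name == name:
                return entry
        return None

    def invoke(self, name):
        # Run a command only if it exists and is currently enabled.
        command = self.find(name)
        if command is None or not command.enabled:
            return False
        command.action()
        return True

log = []
file_menu = Menu("File")
file_menu.add(Command("open", lambda: log.append("open")))
undo = file_menu.add(Command("undo", lambda: log.append("undo"), enabled=False))
submenu = file_menu.add(Menu("submenu"))
submenu.add(Command("close", lambda: log.append("close")))
```

Adding a menu choice then amounts to appending one `Command` to a list, and the enabled flag plays the role of the facility that makes choices available or unavailable according to the application state.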

The  main  menu  choices  will  contain  many  functions,  including  File,  Edit,  View,  Process,  Tools, 
Window,  and  Help.  Many  image  manipulation  functions  will  be  developed  under  each  menu.  For 
example,  under  the  ‘File’  choice  are  the  following.  There  is  a  single  level  ‘undo’  that  is  active  when  a 
command  provides  this  functionality.  It  is  automatically  made  available  or  unavailable  as  appropriate. 
The  ‘open’  choice  presents  a  pop-up  file  selection  dialog  that  permits  browsing  through  filesystems  and 
the  selection  of  files.  The  ‘new’  command  creates  another  main  window  capable  of  displaying  a 
different image with its own colormap. Currently, the only colormap available is the grayscale;
however, a facility allowing a user-created or user-specified map will be included. The ‘close’ command
destroys  its  main  window,  unless  it  is  the  last  one.  The  ‘submenu’  choice  demonstrates  the  hierarchical 
menu  facility  by  presenting  a  submenu  with  a  few  of  the  above  choices  included  to  prove  functionality. 
The  ‘quit’  choice  simply  exits  the  application,  destroying  all  of  its  windows,  after  user  affirmation  of 
intent. Under the ‘Help’ menu there is the ‘Version’ command, which reports the revision number of the
program. Under the ‘View’ menu, there will be windowing, zooming, panning, and other image
enhancement tools. Under the ‘Process’ menu, there will be all the CAD software and image processing
functions. Under the ‘Tools’ menu, there will be region of interest (ROI) selection and manipulation, text
and graphical overlay, etc.

Some commands, such as ‘Quit’ and ‘New’, are global in scope: no matter how many main
windows  are  open  or  which  one  is  currently  active,  they  do  the  same  thing.  Others,  such  as  ‘Open’  and 
‘Close’,  are  local  to  the  active  main  window.  It  is  possible  to  link  some  local  operations  to  operations  in 
other  main  windows,  since  the  application  framework  maintains  lists  of  the  appropriate  X  windows  data. 
This has been temporarily demonstrated for the case of scrolling operations: navigation within an
image in one window can be propagated to all other active windows. It remains to be defined when this
facility  would  be  useful. 

Presently,  only  one  image  format  can  be  read.  In  progress  is  the  definition  of  a  C++  image  class 
to  contain  the  image  data  and  the  development  of  functions  to  read  and  write  various  image  file  formats, 
including  DICOM.  This  class,  or  perhaps  one  derived  from  it,  will  include  functions  to  perform  various 
manipulations  of  the  data,  such  as  thresholding,  windowing,  histograms,  filtering,  region  of  interest 
designation,  etc.  Of  particular  concern  is  the  amount  of  data  presented  by  mammograms  relative  to 
available  system  memory.  Several  file  and  memory  management  methods  are  being  examined  with 
regard  to  efficiency  and  speed.  Promising  are  several  objects  found  in  the  Standard  Template  Library 
(STL)  supplied  with  C++. 

One  highly  desirable  addition  to  the  application  framework  will  be  an  object  that  supports  a  user- 
customizable  detachable  toolbar  that  provides  rapid  access  to  frequently  used  operations.  Another  will 
be  an  extensive,  context  sensitive,  help  utility.  Another  will  be  a  simple  password  verification  method 
capable  of  restricting  availability  of  certain  procedures  depending  on  the  user’s  status  (user  vs 
developer)  and  preference  file. 

We  will  continue  to  develop  the  GUI  for  the  CAD  workstation.  At  the  same  time,  the  CAD 
programs  for  detection  of  masses  and  microcalcifications  are  being  improved.  These  programs  will  be 
integrated  into  the  CAD  workstation  via  the  GUI  in  the  near  future. 

(b)  Conversion  of  CAD  software  from  VMS  to  UNIX  operating  system 

Until last year, our program development was based on the VMS operating system from the
Digital Equipment Corporation (DEC). Because the UNIX operating system is more commonly used on
workstations and allows more flexibility in selecting new workstations, peripherals, and software for
future applications, we decided to migrate to UNIX early this year. We have to convert all of our CAD
programs to run under UNIX. Many of the programs have to be modified and tested because there are
some differences in the FORTRAN and C compilers under the two operating systems. Most of the
modifications are relatively minor. However, some of the FORTRAN programs need major changes
because the original versions incorporated subroutine utilities that are specific to the DEC VMS operating
system. To date, the conversion of the mass detection programs has been completed. The conversion
of the microcalcification detection programs is near completion. These programs will be implemented
on the CAD workstation and will be run with the GUI for the pilot clinical trial.

(c)  Automated  microcalcification  detection  program 

We have been modifying the FORTRAN codes to make the programs portable between different
operating systems, as described above. The programs will be interfaced to the GUI so that they can be
executed on a given digitized mammogram via a menu on the GUI. They can also be set up so that a
group of digitized mammograms can be processed consecutively in batch mode overnight. The
automated  microcalcification  detection  program  is  one  of  the  main  components  of  the  CAD  workstation 
for  the  pilot  clinical  trial. 

(d)  Automated  mass  detection  program 

The mass detection scheme is now running under both VMS and UNIX. We can apply the
detection scheme in either training or test mode. Training mode allows the classifiers at each stage of the
algorithm to be trained, while test mode allows a single image to be run through the entire detection
scheme using fixed classifiers and fixed thresholds. The test mode can also be set up to process a
set  of  digitized  mammograms  in  a  batch  mode.  The  mass  detection  code  takes  approximately  7  minutes 
per  image  using  the  current  implementation  and  is  currently  able  to  detect  90%  of  the  biopsy  proven 
masses  with  3.4  false-positives  (FPs)  per  image  or  80%  of  the  masses  with  1.4  FPs  per  image  based  on 
255  test  images  containing  a  single  biopsy  proven  mass.  The  mass  detection  is  currently  implemented 
using  density-weighted  contrast  enhancement  (DWCE)  detection  in  the  first  stage  and  then  refining  the 
mass  borders  using  gray-scale  and  gradient  based  region  growing.  Between  each  stage  of  the  algorithm, 
morphological FP reduction using individual feature thresholding followed by linear discriminant
classification  is  used.  In  the  final  step  of  the  algorithm,  texture  features  for  each  of  the  detected  objects 
are  calculated  and  used  in  a  linear  classifier  to  further  differentiate  between  breast  masses  and  normal 
structures. 
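
The two operating points quoted above (90% sensitivity at 3.4 FPs per image, 80% at 1.4) are two positions on the same sensitivity versus false-positive tradeoff, selected by the final decision threshold. The sketch below shows how one such point could be computed from scored detections. It is an illustration only: the data layout, the function name, and the toy scores are assumptions and are not taken from our detection programs.

```python
def froc_point(detections, n_masses, n_images, threshold):
    """Return (sensitivity, false positives per image) at a score threshold.

    detections: list of (score, mass_id) pairs, where mass_id identifies the
                true mass that the detection overlaps, or is None for a
                detection on normal tissue.
    """
    kept = [d for d in detections if d[0] >= threshold]
    found = {mass_id for _, mass_id in kept if mass_id is not None}
    false_pos = sum(1 for _, mass_id in kept if mass_id is None)
    return len(found) / n_masses, false_pos / n_images

# Toy data: two images, two masses, six detected objects.
toy = [(0.9, "m1"), (0.8, None), (0.7, "m2"),
       (0.4, None), (0.3, "m1"), (0.2, None)]
strict = froc_point(toy, n_masses=2, n_images=2, threshold=0.75)
loose = froc_point(toy, n_masses=2, n_images=2, threshold=0.1)
```

Lowering the threshold raises the sensitivity and the number of false positives together, which is the behavior reflected in the two reported operating points.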

We are currently investigating the use of fuzzy classifiers to improve both the morphological and
feature classification. In addition, new morphological features are under investigation to further
differentiate normal structures from breast masses. We are in the process of testing the current detection
scheme on an unknown image set. This independent test set will be used to gauge the robustness of our
current  implementation. 

Accurate  delineation  of  mass  boundaries  is  an  important  step  for  the  differentiation  of  true-  and 
false-positives  in  an  automated  mass  detection  program.  We  evaluated  both  qualitatively  and 
quantitatively  the  accuracy  of  two  automated  techniques  for  the  segmentation  of  mass  boundaries  from 
regions  of  interest  (ROIs).  One  hundred  and  fifteen  mammograms  with  biopsy-proven  masses  were 
acquired  from  58  patient  files.  The  mammograms  were  digitized  at  100  micron  resolution.  For  each 
image,  a  fixed  size  ROI  containing  a  mass  was  extracted.  Two  automated  segmentation  algorithms,  one 
based  on  an  adaptive  clustering  method  using  the  gray  level,  edge  gradient,  and  median-filtered  images 
of  the  ROI  and  another  based  on  statistical  modeling  of  the  ROI  as  a  Gauss-Markov  random  field 
(GMRF),  were  used  to  extract  mass  boundaries.  These  boundaries  were  compared  with  manually 
segmented  boundaries  for  41  masses.  Two  quantitative  measures,  the  mass  fraction  ratio  index  (MFRI) 
(the  ratio  of  the  intersection  between  the  computer-segmented  mass  area  and  the  manually  segmented 
mass  area  to  the  computer-segmented  mass  area)  and  the  percentage  difference  in  the  average  radial 
length,  extracted  from  the  segmented  mass  were  compared.  A  qualitative  evaluation  of  83  masses  was 
also  performed  in  which  5  observers  rated  the  similarity  of  the  two  computer-segmented  mass 
boundaries to the perceived mass boundary on a 10-point scale. The MFRI value for the algorithm
based on the clustering was 0.95 and for that based on the GMRF model was 0.92. The mean
differences in the average radial length feature were 13 and 20, respectively. The qualitative observer
study indicated that the GMRF algorithm was marginally (2.5%) better than the clustering algorithm.
Our quantitative and qualitative evaluation indicates that the segmented masses obtained from the
automated algorithms are reasonably close to those from manual segmentation. Morphological feature
values  obtained  from  automated  and  manual  segmentations  were  comparable. 
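
The MFRI defined above can be computed directly from two binary masks. The following sketch assumes that both segmentations are given as equally sized lists of rows of 0/1 values; this representation and the function name are illustrative assumptions.

```python
def mfri(computer_mask, manual_mask):
    """Mass fraction ratio index: |computer AND manual| / |computer|."""
    computer_area = 0
    intersection = 0
    for computer_row, manual_row in zip(computer_mask, manual_mask):
        for c, m in zip(computer_row, manual_row):
            if c:
                computer_area += 1
                if m:
                    intersection += 1
    if computer_area == 0:
        raise ValueError("computer-segmented mass is empty")
    return intersection / computer_area

computer = [[0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 0, 1, 0]]
manual = [[0, 1, 1, 1],
          [0, 1, 1, 1],
          [0, 0, 0, 0]]
value = mfri(computer, manual)  # 4 shared pixels out of 5 computer pixels
```

Because the denominator is the computer-segmented area, a computer boundary lying entirely inside the manual boundary scores 1.0 regardless of how much of the mass it misses; this is one reason a second measure such as the average radial length is reported alongside it.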

These techniques improve the mass segmentation over the current technique used in our mass
detection programs, which is based on thresholding and edge detection. We will incorporate the new
segmentation  techniques  into  our  mass  detection  programs.  The  morphological  features  derived  from 
the  masses  segmented  with  the  improved  method  are  expected  to  be  more  effective  for  reducing  false- 
positives. 

(e)  CAD-guided  data  compression 

We are developing a data compression method for compressing digitized mammograms. Because
of the high spatial resolution and high image fidelity required for mammograms, image compression
methods specifically tailored to mammographic images are needed. The researchers at Georgetown
University have been investigating new image compression methods for mammograms. The following
summarizes their progress.

1.  Dyadic  Data  Decomposition 

1.1  A  Unification  System 


A decomposition method (H+GP) generalized from the Haar transform has been derived. This
general form can exactly describe dyadic doublet-type transforms such as orthogonal wavelets. Another
general form (B+GP) based on the binomial filter can describe dyadic triplet-type transforms such as
biorthogonal wavelets. Although the doublets and triplets seem to function independently, they share
exactly the same decomposition principles. We can integrate the major dyadic decomposition methods
through a unified view. Figure 1 shows the unified perspective of dyadic decomposition systems that we
have developed.


[Figure 1 (diagram): hierarchical structure of the dyadic decomposition systems, from a general form
to subsets. D+GP is the general form, branching into the singlet basis systems (D+P, or Sd+P), the
doublet basis systems (H+GP), and the triplet basis systems (B+GP). H+GP contains H+P (S+P), the
orthogonal wavelets, and the Haar wavelet. B+GP contains B+P (Sb+P) and the biorthogonal wavelets.]

Figure 1. A diagram illustrating the unification of major dyadic decomposition systems.

The doublet- and triplet-types of dyadic transformation systems can be unified by the delta
function basis (D+GP) decomposition system as follows:

D+GP (delta + generalized prediction)

lowpass:    $s''_n = x_{2n} + a''(x_{2n \pm i})$    (1)

and

highpass:   $t''_n = x_{2n+1} + e''(x_{2n+1 \pm i})$    (2)

where $a''$ and $e''$ are the added process and prediction terms, respectively. The well-known DPCM is a
special case of the D+GP transform obtained by setting $a''(\cdot) = -x_{2n-1}$ and $e''(\cdot) = -x_{2n}$. The
doublet and triplet decomposition systems are:

H+GP (Haar + generalized prediction) and B+GP (binomial + generalized prediction)

lowpass:
    H+GP:   $s_n = (x_{2n+1} + x_{2n})/2 + a(x_{2n \pm i})$
    B+GP:   $s'_n = (x_{2n-1} + 2x_{2n} + x_{2n+1})/4 + a'(x_{2n \pm i})$    (3)

and

highpass:
    H+GP:   $t_n = (x_{2n+1} - x_{2n}) + e(x_{2n \pm i})$
    B+GP:   $t'_n = (2x_{2n} - x_{2n-1} - x_{2n+1}) + e'(x_{2n \pm i})$    (4)

1.2 Integer Implementation of Wavelet Transform

Based on the above theoretical development, we found that the general form can be used as a
bridge for various dyadic decompositions. We have also successfully linked the integer-based S transform
and the wavelet transform. A brief theory is given below:

For a set of digital data (i.e., an integer data sequence), the Haar transform [25] can be approximated
by the Sequential (S) transform [26]. Basically, the S transform computes the average and the difference
of two adjacent elements of the integer data sequence. More specifically, the former is the truncated
integer of the average value. For a given data sequence X: ($x_i$, $i = 0, \ldots, N-1$), the integer form of
the Haar transform (i.e., the S transform) can be written as:

$b_n = \lfloor (x_{2n} + x_{2n+1})/2 \rfloor$    (5)

and    $d_n = x_{2n+1} - x_{2n}$,    (6)

where $\lfloor \cdot \rfloor$ stands for a truncation operation of a value. The corresponding inverse operations are:

$x_{2n+1} = b_n + \lfloor (d_n + 1)/2 \rfloor$    (7)

and    $x_{2n} = x_{2n+1} - d_n$.    (8)
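
Eqs. (5) to (8) can be exercised on a short integer sequence. The sketch below is a minimal illustration, assuming an even-length input and interpreting the truncation as rounding downward (Python's `//`), which is what makes the inverse exact for negative differences; the function names are hypothetical.

```python
def s_forward(x):
    """S transform: truncated averages b_n (Eq. 5) and differences d_n (Eq. 6)."""
    if len(x) % 2:
        raise ValueError("sequence length must be even")
    half = len(x) // 2
    b = [(x[2 * n] + x[2 * n + 1]) // 2 for n in range(half)]
    d = [x[2 * n + 1] - x[2 * n] for n in range(half)]
    return b, d

def s_inverse(b, d):
    """Inverse S transform (Eqs. 7 and 8)."""
    x = []
    for bn, dn in zip(b, d):
        odd = bn + (dn + 1) // 2   # x_{2n+1}, Eq. (7)
        even = odd - dn            # x_{2n},   Eq. (8)
        x.extend([even, odd])
    return x

samples = [10, 13, 7, 2, 5, 5]
b, d = s_forward(samples)
restored = s_inverse(b, d)
```

For this input the averages are [11, 4, 5] and the differences are [3, -5, 0], and the inverse returns the original sequence exactly.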

To further reduce the first-order entropy of the data sequence, we may add a prediction term, $e_n$,
in the high pass domain for Eq. (2).

$t_n = (x_{2n+1} - x_{2n}) + \lfloor e_n + 1/2 \rfloor = d_n + \lfloor e_n + 1/2 \rfloor$.    (9)

The addition of 1/2 and the truncation operation serve the integer implementation; otherwise they can
be neglected. If the predictive values are statistically close to the values of $-d_n$, a reduction of the
first-order entropy is achieved. The wavelet decomposition follows a similar concept of data prediction to
reduce the entropy of the data sequence. In fact, we can use the adjacent decomposed Haar coefficients
(i.e., $b_n$ and $d_n$) to form an orthogonal wavelet transform through the prediction indicated in Eq. (9).
First, let the prediction term be linearly composed of the adjacent decomposed coefficients, such that


$e_n = \sum_i \alpha_i b_{n+i} + \sum_{j \neq 0} \beta_j d_{n+j}$,    (10)

where $\alpha_i$ and $\beta_j$ are the contribution factors of the decomposed Haar coefficients. Secondly,
substitute Eq. (10) into Eq. (9) and let Eq. (9) equal a scaled wavelet transform with a high pass
convolution kernel $(\ldots, -h_3, h_2, -h_1, h_0)$. By omitting the convolution operations with the data
vector x on both sides of the equation, we have

$C \times (\ldots, -h_3, h_2, -h_1, h_0) = \cdots$
    $+ (\ldots, 1/2, 1/2, 0, 0) \times \alpha_{-1}$
    $+ (\ldots, 0, 0, 1/2, 1/2) \times \alpha_0$
    $+ \cdots$
    $+ (\ldots, -1, 1, 0, 0) \times \beta_{-1}$
    $+ (\ldots, 0, 0, -1, 1) \times \beta_0$    (11)

where $b_n = (\ldots, 0, 0, 1/2, 1/2) \otimes x$, $b_{n-1} = (\ldots, 1/2, 1/2, 0, 0) \otimes x$,
$d_n = (\ldots, 0, 0, -1, 1) \otimes x$, $d_{n-1} = (\ldots, -1, 1, 0, 0) \otimes x$, and so on. In addition, C is
the offset scale factor between the two transform systems, and $\beta_0$ should be unity because the last
term of the top form represents $d_n$. The rest of the contribution factors (i.e., $\alpha_i$ and $\beta_j$) as well
as the value of C can be solved in terms of the filter coefficients (i.e., $h_i$) of an m-tap transformation.
Specifically:

$C = 2/(h_0 + h_1)$,

$\alpha_0 = 2(h_0 - h_1)/(h_0 + h_1)$ and $\beta_0 = 1$,

$\alpha_{-i} = 2(h_{2i} - h_{2i+1})/(h_0 + h_1)$ and $\beta_{-j} = (h_{2j} + h_{2j+1})/(h_0 + h_1)$.    (12)
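
The relations in Eq. (12) can be checked numerically. The sketch below assumes the standard 8-tap Daubechies (D8) low-pass coefficients normalized to sum to the square root of 2, and assumes that the pair $(h_{2k}, h_{2k+1})$ acts on the sample pair $(x_{2(n-k)+1}, x_{2(n-k)})$; both the function names and this indexing convention are illustrative assumptions.

```python
# Assumed: standard Daubechies 8-tap low-pass coefficients (sum = sqrt(2)).
D8 = [0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
      -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
      0.032883011666982945, -0.010597401784997278]

def prediction_factors(h):
    """Return C, alpha, beta from Eq. (12); alpha[k] and beta[k] hold the
    factors with subscript -k."""
    s = h[0] + h[1]
    pairs = range(len(h) // 2)
    alpha = [2 * (h[2 * k] - h[2 * k + 1]) / s for k in pairs]
    beta = [(h[2 * k] + h[2 * k + 1]) / s for k in pairs]
    return 2 / s, alpha, beta

def combined_kernel(alpha, beta):
    """Taps applied to x_{2n+1}, x_{2n}, x_{2n-1}, ... by the sum
    alpha_{-k} * b_{n-k} + beta_{-k} * d_{n-k}."""
    taps = []
    for a, bt in zip(alpha, beta):
        taps.append(a / 2 + bt)  # coefficient of the odd sample of pair n-k
        taps.append(a / 2 - bt)  # coefficient of the even sample of pair n-k
    return taps

C, alpha, beta = prediction_factors(D8)
taps = combined_kernel(alpha, beta)
print(C)
```

Under these assumptions the combined Haar-plus-prediction kernel equals C times the alternating-sign high-pass kernel tap by tap, the $\alpha$ factors sum to zero, and C agrees with the value quoted for the D8 filter in the caption of Figure 3.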


A similar derivation can be extended to any 2-band filtering system. Based on Eq. (12), the
decomposition coefficient is exactly the same as the high frequency coefficient through the 2-band
transform multiplied by the constant C. From Eq. (12), we can easily find that
$\sum_i \alpha_i = C \sum_i (-1)^i h_i$. Since the property of zero-mean filtering is maintained (i.e.,
$\sum_i (-1)^i h_i = 0$) in an orthogonal wavelet or a 2-band filtering system, $\sum_i \alpha_i$ must vanish
to match the case [27]. The above derivation indicates that any orthogonal wavelet transform can be made
from the Haar transform through prediction. In addition, the integer implementation, such as that shown
in Eq. (9), can be made by (i) downward truncation and (ii) use of approximated rational values of
$\alpha_i$ and $\beta_j$. Since Eq. (9) is always reversible, the accuracy of the $\alpha_i$ and $\beta_j$ values is
not a concern as long as they are used consistently in both the forward and inverse transforms.
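
The reversibility argument can be demonstrated with a small integer transform. The following is a minimal sketch, not our research software: it assumes the 4-tap Daubechies filter, rational approximations of the factors with denominators up to 64, and zero contribution from coefficients before the start of the sequence. All function names are hypothetical.

```python
from fractions import Fraction
from math import floor

# Assumed: standard Daubechies 4-tap low-pass coefficients (sum = sqrt(2)).
D4 = [0.48296291314453414, 0.8365163037378079,
      0.2241438680420134, -0.12940952255126037]

def rational_factors(h, max_den=64):
    """Rational approximations of the factors; index k holds subscript -k."""
    s = h[0] + h[1]
    pairs = range(len(h) // 2)
    alpha = [Fraction(2 * (h[2 * k] - h[2 * k + 1]) / s).limit_denominator(max_den)
             for k in pairs]
    beta = [Fraction((h[2 * k] + h[2 * k + 1]) / s).limit_denominator(max_den)
            for k in pairs]
    return alpha, beta

def predict(b, d, n, alpha, beta):
    """Integer prediction floor(e_n + 1/2) from b_n, b_{n-1}, ... and d_{n-1}, ..."""
    e = Fraction(0)
    for k, a in enumerate(alpha):
        if n - k >= 0:
            e += a * b[n - k]
    for k, bt in enumerate(beta):
        if k >= 1 and n - k >= 0:   # beta_0 multiplies d_n itself, handled outside
            e += bt * d[n - k]
    return floor(e + Fraction(1, 2))

def forward(x, alpha, beta):
    half = len(x) // 2
    b = [(x[2 * n] + x[2 * n + 1]) // 2 for n in range(half)]
    d = [x[2 * n + 1] - x[2 * n] for n in range(half)]
    t = [d[n] + predict(b, d, n, alpha, beta) for n in range(half)]
    return b, t

def inverse(b, t, alpha, beta):
    d = []
    for n in range(len(b)):
        d.append(t[n] - predict(b, d, n, alpha, beta))  # uses only earlier d values
    x = []
    for bn, dn in zip(b, d):
        odd = bn + (dn + 1) // 2
        x.extend([odd - dn, odd])
    return x

alpha, beta = rational_factors(D4)
data = [12, 15, 20, 18, 3, 40, 41, 41, 0, 255]
low, high = forward(data, alpha, beta)
recovered = inverse(low, high, alpha, beta)
```

The inverse recomputes exactly the same integer prediction as the forward pass, so the reconstruction is exact even though the factors are only approximations of the values given by Eq. (12).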

For biorthogonal wavelets, the binomial decomposition would be the basis [28] and will also be
studied. The transformed coefficients will then be encoded by embedded zerotree wavelet (EZW) coding [29] or
partitioning in hierarchical trees (PHT) [30]. Our recent studies indicated that both coding methods
result in similar compression ratios. However, PHT seems to be more computationally efficient.

1.3  Preliminary  Study  of  the  Wavelet  Transforms  using  Real  and  Integer  Value  Computation 

We have done several preliminary studies to show that our algorithms are practical. Five CT images
and five mammograms were used to perform the forward and inverse transforms using regular dyadic
wavelet decomposition as well as the integer implementation described above. Our study indicated that
all images were fully reconstructed with the two-dimensional transformation. Figure 2 shows an original
CT profile. (Note that profiles of mammograms are relatively shallow, so the small differences in the
decomposed data are difficult to show.) The decomposed profile based on the regular wavelet transform
(1-D) is shown in Figure 3. Its counterpart based on the integer version of the wavelet transform is shown
in Figure 4. The low pass coefficients are almost the same. The main difference between the two
decomposed high pass profiles appears at the sharp edge areas. This is because large values accentuate
the computational differences between real and integer calculation. In fact, a larger number of small
coefficients are found in the integer version of the profile. The first order entropy is [value illegible]
in the high pass profile using real computation. The first order entropy is 5.7 in the high pass profile in
Figure 4. The excess entropy results from the differences between the original and quantized profiles
computed through real values. The quantization procedure is necessary for any real values to be encoded
in a compression scheme.
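
For reference, the first order entropy used in this comparison is the Shannon entropy of the histogram of coefficient values, in bits per coefficient. The sketch below is a generic illustration of that definition, not the code used to produce the figures; the function name is hypothetical.

```python
from collections import Counter
from math import log2

def first_order_entropy(values):
    """Shannon entropy, in bits per sample, of the histogram of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * log2(c / total) for c in counts.values())

flat = first_order_entropy([5, 5, 5, 5])        # a single symbol
two_level = first_order_entropy([0, 0, 1, 1])   # two equally likely symbols
four_level = first_order_entropy([0, 1, 2, 3])  # four equally likely symbols
```

Real-valued coefficients would have to be quantized to a finite set of values before this measure can be applied, which is the quantization step referred to above.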


Figure  2.  An  original  CT  profile. 

Figure 3. Single level data decomposition through the Daubechies’ D8 wavelet filter. The left profile
shows the decomposed low pass coefficients. The right profile is the decomposed high pass coefficients
with C = 2.115899710319.


Figure  4.  Single  level  data  decomposition  through  the  integer  implementation  of  Daubechies'  D8  wavelet. 
The  left  profile  shows  the  decomposed  low  pass  coefficients.  The  right  profile  is  the  decomposed  high 
pass  coefficients. 


2. On Optimization of Wavelet Kernels for Mammography Image Compression

A neural network based method has been developed to search for optimal wavelet kernels which
can produce the most favorable set of transform coefficients to preserve data accuracy and/or defined
image features during the compression. In this study, our technical achievements were: (a) development
of a unified method to facilitate multichannel wavelet decomposition; (b) design of a cost (error)
function consisting of the MSE and an imposed entropy reduction function for training the convolution
neural network; and (c) conversion of the kernel suggested by the neural network into a filter constrained
by the wavelet requirements.


In all medical image modalities we have tested so far (including mammography, CT, and MRI),
Daubechies' wavelet (or its nearby wavelets) generally performs better (in most cases slightly better)
than other wavelets for image compression when judged by a global measure. With a large quantization
factor, Haar's wavelet produces the lowest mean-square error for the background areas and the highest
for the microcalcification profile areas. Daubechies' wavelet, however, produces the opposite
result. In addition, we found that the wavelet associated with the low-pass filter (0.32252136,
0.85258927, 0.38458542, -0.14548269) possesses the highest feature preservation capability in terms of
microcalcification peak, contrast, and signal-to-noise ratio. Through this study, we also found that only
Haar's wavelet sometimes produced a dramatic result; usually the optimum occurs over a band of wavelets,
not at a single wavelet [31].
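
As a quick check of what a filter "constrained by the wavelet requirements" looks like, the sketch below tests the four-tap low-pass filter quoted above against three standard conditions for an orthonormal wavelet filter: the taps sum to the square root of 2, the filter has unit energy, and it is orthogonal to its own shifts by an even number of taps. That these are exactly the constraints imposed in this study is an assumption; the function name `orthonormality_residuals` is illustrative.

```python
from math import sqrt

# Low-pass taps quoted in the text.
H = (0.32252136, 0.85258927, 0.38458542, -0.14548269)


def orthonormality_residuals(h):
    """Return the deviations of filter `h` from three orthonormality conditions.

    dc_gain : sum of taps minus sqrt(2)
    energy  : sum of squared taps minus 1
    shifts  : inner products of h with itself shifted by 2, 4, ... taps
              (each should be 0)
    """
    dc_gain = sum(h) - sqrt(2.0)
    energy = sum(c * c for c in h) - 1.0
    shifts = [
        sum(h[i] * h[i + 2 * k] for i in range(len(h) - 2 * k))
        for k in range(1, len(h) // 2)
    ]
    return dc_gain, energy, shifts


dc_gain, energy, shifts = orthonormality_residuals(H)
print(dc_gain, energy, shifts)
```

For the quoted taps all three residuals are zero to within the printed precision of the coefficients, which is consistent with the filter being a valid orthonormal low-pass kernel.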

We, therefore, conclude that Daubechies' wavelet (and its nearby wavelets) is generally
applicable for image compression. However, Haar's wavelet is suitable for low-noise smooth areas and
sharp edges. For a specific image pattern such as microcalcifications on mammograms, one might find a
wavelet filter that best preserves the features.


3.  Mammography  Image  Compression  Using  Wavelet  Transform  via  Integer  Computation 

We have developed a research software package that is capable of compressing any image with a
lossless or lossy result controllable by the user. In addition, any wavelet can be approximated by its
associated integer implementation in this software as described in section 1.2. This research package
allows the user to select the desired wavelet. Both lossless and lossy compression were studied on 3 sets of
mammograms (4 images per set: MLO and CC views of the left and right breasts) which
were digitized at 100 µm and 12 bits per pixel (i.e., ranging from 10 Mbytes to 18 Mbytes per image
originally). The three sets were chosen to represent a large, a medium, and a small breast.

3.1.  Lossless  Compression  Study 


We  roughly  trimmed  the  air  region  and  the  boundary  of  the  films  from  the  originally  digitized 
mammograms. We then performed a lossless compression study using Daubechies' wavelet transform
with the integer computation method. The results are shown in Table I.


The average compression ratio was about 3.3 based on the trimmed, 12 bit/pixel image. This result
increased dramatically, to about 11 for medium and large breast images and to roughly 42 to 44
for small breast mammograms, when the originally digitized image is used as the basis. These results can
be further increased by a factor of 1.5 when we mask out the air area in the trimmed images in our
computer software. The results of this study will be given in the report of this project next year.

Table I. Compression ratios based on Daubechies' wavelet transform with integer computation

Breast Type                   Large Breast              Medium Breast             Small Breast
View                          CC           MLO          CC           MLO          CC           MLO

Original data size            2,560 x 3,584 x 2 bytes   2,048 x 2,560 x 2 bytes   2,048 x 2,560 x 2 bytes
                              (18,350,080 bytes)        (10,485,760 bytes)        (10,485,760 bytes)

Roughly trimmed data size     1,500 x 2,500 x 1.5 bytes 1,024 x 2,048 x 1.5 bytes 512 x 1,024 x 1.5 bytes
(each view)                   (5,625,000)               (3,145,728)               (786,432)

Compressed data size using    1,608,316    1,724,551    945,675      993,412      239,776      250,343
the lossless method (bytes)

Compression ratio to trimmed  3.50         3.26         3.33         3.17         3.28         3.14

Compression ratio to original 11.41        10.64        11.09        10.56        43.73        41.89

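
The two ratios in Table I follow directly from the byte counts. The short sketch below recomputes them for the large breast CC image and the small breast CC image; the helper name `compression_ratio` is illustrative, and the sizes are copied from the table.

```python
def compression_ratio(uncompressed_bytes, compressed_bytes):
    """Ratio of the uncompressed size to the compressed size."""
    return uncompressed_bytes / compressed_bytes


# (label, original bytes, trimmed bytes, compressed bytes), values from Table I.
rows = [
    ("Large CC", 2560 * 3584 * 2, int(1500 * 2500 * 1.5), 1_608_316),
    ("Small CC", 2048 * 2560 * 2, int(512 * 1024 * 1.5), 239_776),
]

for label, original, trimmed, compressed in rows:
    to_trimmed = compression_ratio(trimmed, compressed)
    to_original = compression_ratio(original, compressed)
    print(f"{label}: {to_trimmed:.2f} (trimmed), {to_original:.2f} (original)")
```

The factor of 1.5 bytes per pixel in the trimmed size reflects 12-bit packing, whereas the originally digitized files store each pixel in 2 bytes; this, together with the trimming, accounts for the much larger ratio relative to the original files.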

3.2.  Lossy  Compression  Study 

The beauty of the integer wavelet compression method is that the lossless and lossy methods can
be implemented in the same transform domain. The transform domain coefficients used in the above
lossless compression study were further processed by a quantization procedure (i.e., the values were
uniformly divided by a factor in each level of wavelet compartments) followed by arithmetic coding.
The decompressed images were compared to the original images to obtain the normalized mean-square
errors (NMSE). The results are shown in Table II.
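
The quantization step and the error measure can be sketched as follows. This is a simplified illustration, not the software used in the study: it quantizes a short list of made-up integer coefficients with a single factor, and it measures the error directly on those values rather than on a reconstructed image. The NMSE definition used here (squared error divided by signal energy) is an assumption, since the report does not state its normalization.

```python
def quantize(coeffs, factor):
    """Uniformly divide each integer coefficient by `factor` and round."""
    return [round(c / factor) for c in coeffs]


def dequantize(levels, factor):
    """Map quantized levels back to the coefficient scale."""
    return [q * factor for q in levels]


def nmse(original, restored):
    """Assumed definition: summed squared error divided by signal energy."""
    error = sum((o - r) ** 2 for o, r in zip(original, restored))
    energy = sum(o ** 2 for o in original)
    return error / energy


coeffs = [12, -7, 3, 0, 25]   # hypothetical coefficients of one wavelet level
factor = 4                    # hypothetical quantization factor for that level
levels = quantize(coeffs, factor)
restored = dequantize(levels, factor)
print(levels, restored, nmse(coeffs, restored))
```

With a factor of 1 the levels equal the integer coefficients and the scheme is lossless, which is the sense in which both modes share the same transform domain.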

Table II. Compression ratios and NMSE values based on Daubechies' wavelet transform with integer
computation

Breast Type                   Large Breast              Medium Breast             Small Breast
View                          CC           MLO          CC           MLO          CC           MLO

Original data size            2,560 x 3,584 x 2 bytes   2,048 x 2,560 x 2 bytes   2,048 x 2,560 x 2 bytes
                              (18,350,080 bytes)        (10,485,760 bytes)        (10,485,760 bytes)

Roughly trimmed data size     1,500 x 2,500 x 1.5 bytes 1,024 x 2,048 x 1.5 bytes 512 x 1,024 x 1.5 bytes
(each view)                   (5,625,000)               (3,145,728)               (786,432)

NMSE                          3.3          3.3          3.5          3.4          3.5          3.4

CR                            2            2            2            2            2

CR'                           6.52         6.52         6.67         6.67         6.67

where CR represents the compression ratio with respect to the trimmed image and CR' stands for the
compression ratio with respect to the originally digitized mammograms. None of the 24 decompressed images
showed observable differences from the original mammograms, including the microcalcifications, in our
preliminary viewing on a monitor. Further studies will be performed with CAD-guided lossless
compression of the suspicious microcalcification and mass areas, and the results will be reported next
year.

(f)  Planning  of  pilot  clinical  trial 

Dr. Dorfman, Dr. Berbaum, and Dr. Lenth at the University of Iowa have been leading the efforts
in planning the pilot clinical trial and developing methods for statistical analysis of the data to be
collected. The data in this research project can be conceptualized in the following way: Patients and
readers will be crossed with treatments, and patients will be nested within readers. In brief, readers will
participate in both treatments, decision without CADx and decision with CADx. It is assumed that
readers and patients are random samples from their respective populations. The factors are:

Treatments: 2, fixed effect (Control, CADx).

Readers:  9,  random  effect. 

Patients:  5000  (30  cancers  expected),  random  effect  nested  within  readers. 

Although  our  multireader,  multipatient  algorithms  have  been  used  previously  to  analyze  data  from 
experiments  with  designs  similar  to  that  of  the  proposed  studies  [32],  a  major  difference  is  that  patients 
in  the  current  studies  are  nested  within  readers  rather  than  crossed  over  readers.  This  design  is  often 
referred  to  as  a  split-plot  design  in  the  literature.  The  linear  model  for  the  research  is: 

Pseudovalue = μ + Reader + Patient(Reader) + Treatment + Treatment x Reader + Residual

where  Patient(Reader)  denotes  patients  nested  within  readers,  and  'x'  denotes  interaction.  During  this 
last  year,  we  have  extended  our  methodology  to  this  split-plot  design.  Simulations  were  performed  on 
both  the  fully  crossed  factorial  design  and  the  split-plot  design.  Two  papers  are  currently  being  prepared 
for  publication  that  describe  the  results  of  these  simulations  (see  Appendix). 
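
The structure of the split-plot design can be made concrete with a small simulation of the linear model above. This is only a sketch under stated assumptions: every numerical parameter (mean, standard deviations, treatment effect) is hypothetical, pseudovalues are drawn directly from the model instead of being computed by jackknifing ROC indices, and the function name `simulate_split_plot` is illustrative. It is not the simulation code used for the papers in preparation.

```python
import random


def simulate_split_plot(n_readers, n_patients_per_reader, mu=0.8,
                        treatment_effect=0.03, sd_reader=0.05,
                        sd_patient=0.10, sd_interaction=0.02,
                        sd_residual=0.05, seed=0):
    """Draw pseudovalues from the split-plot model.

    Returns a list of (reader, patient, treatment, pseudovalue) tuples.
    Patients are nested within readers: each patient id belongs to one reader,
    but every patient is read under both treatments.
    """
    rng = random.Random(seed)
    fixed = {"Control": 0.0, "CADx": treatment_effect}  # fixed treatment effect
    rows = []
    for r in range(n_readers):
        reader_effect = rng.gauss(0.0, sd_reader)
        interaction = {t: rng.gauss(0.0, sd_interaction) for t in fixed}
        for p in range(n_patients_per_reader):
            patient = (r, p)  # id is unique to this reader (nesting)
            patient_effect = rng.gauss(0.0, sd_patient)
            for t, effect in fixed.items():
                value = (mu + reader_effect + patient_effect + effect
                         + interaction[t] + rng.gauss(0.0, sd_residual))
                rows.append((r, patient, t, value))
    return rows


rows = simulate_split_plot(n_readers=9, n_patients_per_reader=5)
print(len(rows))
```

In a fully crossed design the same patient ids would appear under every reader; here each id appears under exactly one reader, which is the difference that required extending the analysis method.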

Because  the  prevalence  of  breast  cancer  in  a  screening  population  is  low  (about  6  per  1000),  the 
receiver  operating  characteristic  (ROC)  rating  data  will  be  too  sparse  for  traditional  binormal  analysis. 
Sparse  data  lead  to  degenerate  ROC  fits  that  sometimes  cross  the  chance  line.  We  have  developed  and 
evaluated a proper constant-shape bigamma model to handle binormal degeneracy [33]. Monte Carlo samples were
generated  both  from  a  standard  binormal  population  model,  and  from  a  proper  constant-shape  bigamma 
model  in  a  series  of  Monte  Carlo  studies.  The  results  confirm  Hanley’s  proposition  that  the  standard 
binormal  model  is  robust  in  large  samples  with  no  degenerate  datasets  and  Metz’s  proposition  that  the 
standard  binormal  model  is  not  robust  in  small  samples  because  of  degenerate  datasets. 
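
Why a binormal fit can cross the chance line is easy to see numerically. In the conventional binormal parameterization, TPF = Phi(a + b * Phi^-1(FPF)), the curve meets the chance line TPF = FPF where a + b*z = z, i.e. at z = a / (1 - b), whenever b differs from 1. The sketch below evaluates this; the parameter names a and b follow the usual convention and the values are hypothetical, not estimates from this project.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def binormal_tpf(fpf, a, b):
    """True positive fraction of a binormal ROC curve at a given FPF (0 < fpf < 1)."""
    return _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(fpf))


def chance_crossing_fpf(a, b):
    """FPF at which the binormal curve crosses TPF = FPF, or None if b == 1."""
    if b == 1:
        return None
    return _STD_NORMAL.cdf(a / (1 - b))


a, b = 1.0, 0.5  # hypothetical fit with unequal spreads
crossing = chance_crossing_fpf(a, b)
print(crossing, binormal_tpf(0.5, a, b), binormal_tpf(0.99, a, b))
```

For these values the curve lies above the chance line at FPF = 0.5 but falls below it beyond the crossing point near the upper right corner, which is the kind of improper behavior that a proper model is meant to exclude.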

A  proper  constant-shape  bigamma  model  seems  to  solve  the  problem  of  degeneracy  without 
inappropriate  chance  line  crossings.  The  bigamma  fitting  model  outperformed  the  standard  binormal 
fitting  model  in  small  samples,  and  gave  similar  results  in  large  samples.  We  are  currently 
implementing  a  computer  program  that  will  permit  us  to  analyze  the  multireader,  multipatient  data 
collected  in  this  research  project  using  these  new  methods. 




(7)  Conclusions 

We have been developing the CAD workstation and preparing for the pilot clinical study
this year. Except for the slight delay in the GUI development (described below), the project is progressing as
planned  in  the  different  subprojects.  We  have  migrated  to  the  UNIX  operating  system.  The  mass 
detection  program  has  been  improved  and  compared  with  our  previous  results.  The  wavelet  data 
compression  techniques  are  being  investigated,  and  preliminary  planning  of  the  pilot  clinical  study  has 
been  completed. 

We had some difficulties in recruiting a computer programmer to develop the GUI. Because of
the limited budget for this position, it was very difficult to compete for an experienced programmer, given
the high demand for expertise in this area in the industry. The first programmer to whom we offered the
position accepted another position in a commercial firm after we had waited for two months. A second
programmer used our position as a transition and returned to the industry after about one
month on the job. The current programmer initially lacked experience in C++ and GUI development. However, he
was reliable and hard-working and was able to learn these necessary skills in a relatively short time.
He is now well on his way to developing the GUI. This recruitment problem caused some delay in the
development of the CAD workstation. We plan to catch up with the scheduled work as soon as possible.